Chain of Explanation: New Prompting Method to Generate Higher Quality Natural Language Explanation for Implicit Hate Speech
Huang, Fan, Kwak, Haewoon, An, Jisun
Recent studies have exploited advanced generative language models to generate Natural Language Explanations (NLE) for why a certain text could be hateful. We propose the Chain of Explanation (CoE) Prompting method, using the heuristic words and target group, to generate high-quality NLE for implicit hate speech. We improved the BLEU score from 44.0 to 62.3 for NLE generation by providing accurate target information. We then evaluate the quality of the generated NLE using various automatic metrics and human annotations.
The potential of sequence-to-sequence (Seq2Seq) models and prompting methods has not been fully explored [4]. Moreover, traditional evaluation metrics, such as BLEU [20] and ROUGE [18], applied in NLE generation for hate speech may not comprehensively capture the quality of the generated explanations because they heavily rely on word-level overlaps [3]. To fill those gaps, we propose the Chain of Explanation (CoE) prompting method to generate high-quality NLE that distinguishes implicit hate speech from non-hateful tweets.
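The abstract describes prompting with heuristic words and the target group and evaluating the output with word-overlap metrics such as BLEU. Below is a minimal sketch of that general idea, assuming a hypothetical prompt template (the paper's exact wording and BLEU configuration are not given in this excerpt) and using NLTK's smoothed sentence-level BLEU:

```python
# Hypothetical sketch of a Chain-of-Explanation-style prompt plus a BLEU check.
# The prompt template, field names, and example text are illustrative assumptions,
# not the authors' exact design.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def build_coe_prompt(post: str, heuristic_words: str, target_group: str) -> str:
    """Assemble a prompt that gives the model the post, its heuristic
    (implied-meaning) words, and the target group before asking for an NLE."""
    return (
        f"Post: {post}\n"
        f"Heuristic words: {heuristic_words}\n"
        f"Target group: {target_group}\n"
        "Explain in one sentence why this post could be hateful toward the target group:"
    )

def bleu_against_reference(generated_nle: str, reference_nle: str) -> float:
    """Word-overlap score of the kind BLEU reports; smoothing avoids
    zero scores on short sentences."""
    hyp = generated_nle.lower().split()
    ref = reference_nle.lower().split()
    return sentence_bleu([ref], hyp, smoothing_function=SmoothingFunction().method1)

if __name__ == "__main__":
    prompt = build_coe_prompt(
        post="(an implicitly hateful post)",
        heuristic_words="(implied meaning extracted from the post)",
        target_group="(the group the post targets)",
    )
    print(prompt)
    print(bleu_against_reference("this post demeans the group", "the post demeans this group"))
```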
Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech
Huang, Fan, Kwak, Haewoon, An, Jisun
Recent studies have warned that much online hate speech is implicit. Because of its subtle nature, explaining the detection of such hateful speech has been a challenging problem. In this work, we examine whether ChatGPT can be used to provide natural language explanations (NLEs) for implicit hateful speech detection. We design our prompt to elicit concise ChatGPT-generated NLEs and conduct user studies to evaluate their quality in comparison with human-written NLEs. We discuss the potential and limitations of ChatGPT in the context of implicit hateful speech research.
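The abstract mentions prompting ChatGPT for concise NLEs. A minimal sketch of how such a query could be issued through the OpenAI chat API follows; the prompt wording, model choice, and length constraint are assumptions, not the authors' exact prompt design:

```python
# Minimal sketch of eliciting a concise natural language explanation (NLE)
# from a chat model. Prompt text and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def explain_implicit_hate(post: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model choice
        messages=[
            {"role": "system",
             "content": "You explain, in at most one sentence, why a post may be implicitly hateful."},
            {"role": "user", "content": f"Post: {post}\nWhy could this be implicitly hateful?"},
        ],
        temperature=0,
        max_tokens=60,  # keeps the explanation concise
    )
    return response.choices[0].message.content.strip()

if __name__ == "__main__":
    print(explain_implicit_hate("(an implicitly hateful post)"))
```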
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Grauman, Kristen, Westbury, Andrew, Byrne, Eugene, Chavis, Zachary, Furnari, Antonino, Girdhar, Rohit, Hamburger, Jackson, Jiang, Hao, Liu, Miao, Liu, Xingyu, Martin, Miguel, Nagarajan, Tushar, Radosavovic, Ilija, Ramakrishnan, Santhosh Kumar, Ryan, Fiona, Sharma, Jayant, Wray, Michael, Xu, Mengmeng, Xu, Eric Zhongcong, Zhao, Chen, Bansal, Siddhant, Batra, Dhruv, Cartillier, Vincent, Crane, Sean, Do, Tien, Doulaty, Morrie, Erapalli, Akshay, Feichtenhofer, Christoph, Fragomeni, Adriano, Fu, Qichen, Fuegen, Christian, Gebreselasie, Abrham, Gonzalez, Cristina, Hillis, James, Huang, Xuhua, Huang, Yifei, Jia, Wenqi, Khoo, Weslie, Kolar, Jachym, Kottur, Satwik, Kumar, Anurag, Landini, Federico, Li, Chao, Li, Yanghao, Li, Zhenqiang, Mangalam, Karttikeya, Modhugu, Raghava, Munro, Jonathan, Murrell, Tullie, Nishiyasu, Takumi, Price, Will, Puentes, Paola Ruiz, Ramazanova, Merey, Sari, Leda, Somasundaram, Kiran, Southerland, Audrey, Sugano, Yusuke, Tao, Ruijie, Vo, Minh, Wang, Yuchen, Wu, Xindi, Yagi, Takuma, Zhu, Yunyi, Arbelaez, Pablo, Crandall, David, Damen, Dima, Farinella, Giovanni Maria, Ghanem, Bernard, Ithapu, Vamsi Krishna, Jawahar, C. V., Joo, Hanbyul, Kitani, Kris, Li, Haizhou, Newcombe, Richard, Oliva, Aude, Park, Hyun Soo, Rehg, James M., Sato, Yoichi, Shi, Jianbo, Shou, Mike Zheng, Torralba, Antonio, Torresani, Lorenzo, Yan, Mingfei, Malik, Jitendra
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception. Project page: https://ego4d-data.org/
Data science could reshape climate change disaster response
A major wildfire spread through Colorado, and I spent long hours locating shelters, identifying evacuation routes and piecing together satellite imagery. As the Fourmile Canyon Fire devastated areas to the west of Boulder, ultimately destroying 169 homes and causing $217 million in damage, my biggest concerns were ensuring that people could safely evacuate and that first responders had the best chance of keeping the fire at bay. I did all of this while sitting comfortably in my home in Bloomington, Indiana, a thousand miles away from the action. I was a volunteer trying to help fire victims. I had created a webpage to aggregate data about the fire, including the locations of shelters and the latest predictions of fire spread.
Machine Learning for Text Analytics is Getting a Boost
BLOOMINGTON, Ind., Oct. 22, 2019 (GLOBE NEWSWIRE) -- Megaputer Intelligence, Inc. will share an innovative new tool for building training datasets for use in machine learning during a presentation at the Text Analytics Forum '19 held in Washington, DC on November 7. Dr. Sergei Ananyan, CEO of Megaputer Intelligence, Inc., will present a cutting-edge topic entitled, "NLP & Rule-Based Approach for Fact Extraction: Launchpad for Machine Learning Techniques" on Thursday, November 7 at 11:15 AM EST. The Text Analytics Forum will host the presentation at the JW Marriott in Washington, DC as part of its comprehensive programming, running from Nov 4-7. The content of the presentation is designed for people interested in discovering how to achieve higher accuracy from machine learning, relieve the burden of needing experts to manually create a gold standard training dataset, and illuminate the black box surrounding machine learning as much as possible with insight into today's latest technological advances. Professionals such as text analysts, data scientists, DBAs, information knowledge architects, knowledge organizers, taxonomists, ontologists, CIOs, CKOs, research scientists, and data quality managers will benefit greatly from this technique to overcome well-known challenges of machine learning. One fundamental obstacle for using machine learning (ML) to accurately extract facts from free-text documents is that it requires huge quantities of pre-categorized data for training a model.
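The core point of the presentation, as described, is that rule-based fact extraction can bootstrap the labeled data that machine learning needs. The following is a generic toy illustration of that idea, not Megaputer's tooling or its rule language: hand-written regex rules produce weak labels, which then train a conventional text classifier.

```python
# Toy illustration of rule-based weak labeling as a launchpad for ML.
# This is a generic sketch; the rules, labels, and documents are made up.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

RULES = [  # (label, pattern) pairs written by a domain expert
    ("acquisition", re.compile(r"\bacquir(?:ed|es|ing)\b", re.I)),
    ("hiring",      re.compile(r"\b(?:hired|appointed|joins)\b", re.I)),
]

def weak_label(sentence: str):
    """Return the first rule label that matches, or None."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return None

documents = [
    "Acme Corp acquired Widget LLC for an undisclosed sum.",
    "Jane Doe joins Acme Corp as chief data officer.",
    "Acme Corp is acquiring a smaller rival.",
    "The board appointed a new treasurer last week.",
]

labeled = [(d, weak_label(d)) for d in documents if weak_label(d)]
texts, labels = zip(*labeled)

# Train a conventional model on the rule-generated "silver" labels.
vectorizer = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vectorizer.fit_transform(texts), labels)
print(clf.predict(vectorizer.transform(["Acme Corp has hired a new CFO."])))
```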
Yelp Food Identification via Image Feature Extraction and Classification
Sun, Fanbo, Gu, Zhixiang, Feng, Bo
Yelp has been one of the most popular local-service search engines in the US since 2004. It is powered by crowd-sourced text reviews and photo reviews. Restaurant customers and business owners upload photos to Yelp, reviewing or advertising food, drinks, or interior and exterior decorations. Relying on human editors to label food photos is clearly not effective, an issue that should be addressed by innovative machine learning approaches. In this paper, we present a simple but effective approach that can identify up to ten kinds of food from raw photos in the challenge dataset. We use 1) image pre-processing techniques, including filtering and image augmentation, 2) feature extraction via convolutional neural networks (CNN), and 3) three classification algorithms. We then report the classification accuracy obtained by tuning the parameters of the augmentation, the CNN, and the classifiers. Our experimental results show that this simple but effective approach can identify up to 10 food types from images.
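The abstract outlines a three-stage pipeline: augmentation, CNN feature extraction, and classical classification. A minimal sketch of that general pipeline follows, assuming a pretrained ResNet-18 backbone and a linear SVM as one classifier; the paper's exact architectures, augmentations, and parameters are not given in the abstract.

```python
# Minimal sketch of the described pipeline: image augmentation, CNN feature
# extraction, then a classical classifier. Backbone and classifier choices
# are illustrative assumptions.
import torch
import torchvision.transforms as T
from torchvision.models import resnet18, ResNet18_Weights
from sklearn.svm import LinearSVC

# 1) Pre-processing and augmentation
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# 2) CNN feature extraction: drop the final classification layer of ResNet-18
backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

@torch.no_grad()
def extract_features(pil_images):
    batch = torch.stack([augment(img) for img in pil_images])
    return backbone(batch).numpy()  # (N, 512) feature vectors

# 3) Classification on the extracted features
def train_classifier(pil_images, labels):
    clf = LinearSVC()
    clf.fit(extract_features(pil_images), labels)
    return clf
```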
A Practical Algorithm for Distributed Clustering and Outlier Detection
Chen, Jiecao, Azer, Erfan Sadeqi, Zhang, Qin
We study classic k-means/k-median clustering, fundamental problems in unsupervised learning, in the setting where the data are partitioned across multiple sites and where we are allowed to discard a small portion of the data by labeling them as outliers. We propose a simple approach based on constructing a small summary of the original dataset. The proposed method is time- and communication-efficient, has good approximation guarantees, and can identify global outliers effectively. To the best of our knowledge, this is the first practical algorithm with theoretical guarantees for distributed clustering with outliers. Our experiments on both real and synthetic data demonstrate the clear superiority of our algorithm over all baseline algorithms in almost all metrics.
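The abstract's key idea is clustering a small summary of the distributed data while discarding a few points as outliers. The sketch below is a toy, centralized illustration of that idea only, not the paper's algorithm: each site contributes a weighted uniform sample as its "summary," a coordinator runs weighted k-means on the union, and the points farthest from any center are flagged as outliers. Sampling scheme, weights, and thresholds are illustrative assumptions.

```python
# Toy, centralized sketch: per-site weighted samples form a summary, the
# coordinator clusters the summary, and the farthest points are flagged as
# outliers. This is NOT the paper's algorithm.
import numpy as np
from sklearn.cluster import KMeans

def site_summary(points: np.ndarray, sample_size: int, rng):
    """Uniformly sample a subset; each sampled point carries weight n / sample_size."""
    idx = rng.choice(len(points), size=min(sample_size, len(points)), replace=False)
    weights = np.full(len(idx), len(points) / len(idx))
    return points[idx], weights

def cluster_with_outliers(site_data, k, num_outliers, sample_size=200, seed=0):
    rng = np.random.default_rng(seed)
    samples, weights = zip(*(site_summary(p, sample_size, rng) for p in site_data))
    summary = np.vstack(samples)
    summary_w = np.concatenate(weights)

    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    km.fit(summary, sample_weight=summary_w)

    # Flag the summary points farthest from their nearest center as global outliers.
    dists = np.min(np.linalg.norm(summary[:, None, :] - km.cluster_centers_[None], axis=2), axis=1)
    outlier_idx = np.argsort(dists)[-num_outliers:]
    return km.cluster_centers_, summary[outlier_idx]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sites = [rng.normal(loc=c, scale=0.5, size=(1000, 2)) for c in ([0, 0], [5, 5], [0, 5])]
    sites[0] = np.vstack([sites[0], [[20.0, 20.0]]])  # inject one far-away point
    centers, outliers = cluster_with_outliers(sites, k=3, num_outliers=1)
    print(centers, outliers)
```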
Who needs democracy when you have data?
In 1955, science fiction writer Isaac Asimov published a short story about an experiment in "electronic democracy," in which a single citizen, selected to represent an entire population, responded to questions generated by a computer named Multivac. The machine took this data and calculated the results of an election that therefore never needed to happen. Asimov's story was set in Bloomington, Indiana, but today an approximation of Multivac is being built in China. For any authoritarian regime, "there is a basic problem for the center of figuring out what's going on at lower levels and across society," says Deborah Seligsohn, a political scientist and China expert at Villanova University in Philadelphia. How do you effectively govern a country that's home to one in five people on the planet, with an increasingly complex economy and society, if you don't allow public debate, civil activism, and electoral feedback?